Methods in Ecology and Evolution
○ Wiley
Preprints posted in the last 90 days, ranked by how well they match Methods in Ecology and Evolution's content profile, based on 160 papers previously published here. The average preprint has a 0.16% match score for this journal, so anything above that is already an above-average fit.
Casas Gomez-Uribarri, I.; Babayan, S. A.; Okumu, F.; Baldini, F.; Betancourth, M. P.
1. Standard survival analysis methods often rely on the assumption of proportional hazards (PH) or on parameterisations of the survival function that might not be appropriate for wild populations.
2. To enable survival analysis without these modelling constraints, we developed an approach that combines the Kaplan-Meier estimator with conditional probability theory to compute age-specific probabilities of survival up to a target age of choice, τ. Marginalising this probability over the age distribution of the population yields Oτ, the probability that a randomly sampled individual of unknown age will outlive the target age τ. Notably, the value of τ is set by the analyst for each group independently, which allows accounting for differences in pace of life across populations.
3. We tested its application using a simulation study and two real-world datasets, and compared its performance against that of Cox PH and parametric survival models. The PH assumption was violated in all three examples, rendering the Cox PH models inappropriate. Parametric models offered a better alternative, but the best parametric fit missed at least some key survival patterns in all examples. The TAUS model provided a valid description of survival patterns in all cases. Its richer output also allowed finer analysis of survival differences between populations.
4. The TAUS model is also available as an R package (https://github.com/casasgomezuribarri/TAUS). This new approach to survival analysis without PH or parametric assumptions allows the comparison of survival probabilities across populations with different age structures and paces of life. This makes it suitable for a wide range of ecological applications, including population viability analysis, epidemiology, and life-history theory.
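The calculation described in this abstract can be sketched in a few lines. The following is a minimal illustration and not the TAUS package's code: the function names, the toy data, and the convention that individuals already older than τ count as having outlived it are all assumptions made here. It estimates the Kaplan-Meier curve S(t), forms the conditional probability S(τ)/S(a) for each current age a, and averages over an assumed age distribution.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(event_time, S(t))] pairs from the Kaplan-Meier product-limit estimator.

    times  : observed ages at death or censoring
    events : 1 if death was observed, 0 if the record is censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve, age):
    """Step-function lookup of S(age); S = 1 before the first observed death."""
    s = 1.0
    for t, value in curve:
        if t > age:
            break
        s = value
    return s


def prob_outlive(curve, age_dist, tau):
    """Marginalise P(T > tau | T > a) = S(tau) / S(a) over an age distribution.

    age_dist maps current age -> probability (assumed to sum to 1).
    Assumption: individuals with age >= tau contribute probability 1.
    """
    total = 0.0
    for age, weight in age_dist.items():
        if age >= tau:
            cond = 1.0
        else:
            s_age = survival_at(curve, age)
            cond = survival_at(curve, tau) / s_age if s_age > 0 else 0.0
        total += weight * cond
    return total


# Hypothetical toy data: ages at death/censoring and an assumed age structure.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 1, 1]
curve = kaplan_meier(times, events)
age_dist = {0: 0.5, 4: 0.3, 6: 0.2}
print(round(prob_outlive(curve, age_dist, tau=8), 3))  # 0.555
```

Because τ is an argument, it can be chosen separately for each group, as the abstract emphasises.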
Abramov, K.; Galai, G.; Biton, B.; Puzis, R.; Pilosof, S.
1. Ecological communities are complex and exhibit considerable spatial variability, presenting challenges in accurately understanding these systems. A primary obstacle in ecological research is the existence of missing links between species: inevitable unobserved interactions that limit our comprehension of ecological networks and their response to change. While link prediction methods have been developed to address this challenge, most approaches overlook the intrinsic spatial variability of ecological systems.
2. We introduce a flexible, spatially explicit framework based on matrix decomposition that leverages latent structural patterns to predict missing interactions and their strength, without requiring species traits or environmental data. The framework integrates information from paired auxiliary and target networks (locations) using thresholded SVD for link prediction. We applied it to plant-pollinator networks across the Canary Islands, performing pairwise predictions between locations, comparing them to within-location predictions (as a control), and quantifying how spatial variability influences predictive performance.
3. Predictions revealed that latent network structure contains substantial predictive information, with F0.5 scores consistently exceeding a random baseline (mean F0.5 = 0.67 ± 0.02 SD), while being less sensitive to interaction strength. The method enabled identifying plausible gaps in the data and producing ecologically coherent predictions. Incorporating information from auxiliary locations enhanced predictive accuracy in certain cases, but success depended on spatial context: predictions were most reliable when derived from nearby, ecologically similar locations, and declined with increasing geographic and ecological distance, consistent with a distance-decay effect.
4. We conclude that the predictability of missing links is spatially variable, reflecting both network and species-level heterogeneity. These patterns provide insights into network structure and the ecological processes shaping it, complementing trait-based approaches. While network structure offers rich predictive information, spatial context is essential for applying it effectively: ignoring spatial variability can obscure ecological signals and inflate predictive error. Our framework is computationally efficient, transferable, and readily applicable to any system with spatial or temporal replication. It can be used in a variety of ecological contexts, including island systems, fragmented landscapes, and environmental gradients, making it a practical and scalable tool for advancing link prediction in ecology.
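The general idea of scoring unobserved links from latent structure can be illustrated with a deliberately small sketch. This is not the authors' framework: it uses only a rank-1 truncation of the SVD (obtained by power iteration so that no external library is needed), ignores the pairing of auxiliary and target networks, and the threshold value and toy matrix are hypothetical.

```python
def leading_singular_triplet(matrix, iterations=200):
    """Power iteration for the largest singular value and its singular vectors.

    Assumes a non-zero matrix given as a list of equal-length rows.
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / cols ** 0.5] * cols
    sigma, u = 0.0, [0.0] * rows
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = sum(x * x for x in u) ** 0.5
        u = [x / norm_u for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = sum(x * x for x in v) ** 0.5
        v = [x / sigma for x in v]
    return sigma, u, v


def low_rank_scores(matrix):
    """Rank-1 reconstruction sigma * u * v^T, used as interaction scores."""
    sigma, u, v = leading_singular_triplet(matrix)
    return [[sigma * ui * vj for vj in v] for ui in u]


def predict_missing_links(matrix, threshold):
    """Unobserved (zero) entries whose reconstructed score reaches the threshold."""
    scores = low_rank_scores(matrix)
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, observed in enumerate(row)
            if observed == 0 and scores[i][j] >= threshold]


# Hypothetical plant (rows) x pollinator (columns) matrix; 1 = observed interaction.
network = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
print(predict_missing_links(network, threshold=0.4))  # [(2, 1)]
```

Plant 2 shares pollinator 0 with the other plants, so the latent structure assigns its unobserved link to pollinator 1 a score of about 0.49, while pollinator 2, which visits nothing, scores zero.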
Tabell, O.; Moser, N.; Ovaskainen, O.; Karvanen, J.
1. Statistical methods for causal inference are fundamental in ecological research, as ecologists often deal with causal research questions. Consequently, recent years have seen an increase in articles discussing causal inference in ecological contexts. However, generalizing causal findings across ecological systems that differ in environmental context remains a challenge. While we may assess causal relationships in one location or population from experimental or observational data, replicating these findings in different settings can be impractical, expensive, or sometimes impossible.
2. We introduce causal effect transportability to ecological research - a formal framework for transferring causal effects assessed in one domain (the source) to estimate outcomes in a different domain (the target), where broader data collection may be infeasible. Using structural causal models, this framework provides formal criteria for determining when causal effects can be validly transferred between populations and derives appropriate statistical adjustment formulas when transport is possible. Recent algorithmic developments, implemented in accessible R software packages, automate the mathematical derivations and make transportability analysis more practical for ecologists.
3. We demonstrate the framework through a case study examining the effect of tree canopy cover on dissolved oxygen concentrations across different watersheds. We show that transported estimates outperform naive applications of source population models.
4. Causal effect transportability offers critical tools for predicting ecological responses across heterogeneous settings, with particular relevance when experimental replication is constrained by cost, ethics, or urgency, and when management decisions require extrapolating findings to novel environmental contexts.
Boncourt, E.
The global expansion of grey wolf (Canis lupus) populations, particularly in Europe, underscores the need for robust tools to study their social structure, territory use, and genetic relatedness. Wolf packs are dynamic, evolving through dispersal, mortality, and reproductive success, and their accurate identification is crucial for effective conservation and conflict mitigation. Traditional methods for estimating wolf populations and pack structures--such as snow tracking or howling surveys--are labor-intensive and often unreliable. Noninvasive genetic sampling and spatial capture-recapture models have improved monitoring, but integrating genetic and spatial data remains a challenge. We introduce WolfPackR, an R package designed to integrate genetic relatedness and spatial data for identifying wolf packs, lone individuals, and spatially isolated but genetically linked "ugly ducklings." WolfPackR uses pairwise relatedness estimators to define genetic groups and refines these groups through spatial overlap analysis based on Minimum Convex Polygons (MCPs). The package provides a comprehensive toolkit for analyzing population structure, territoriality, and social organization, including functions for genetic grouping, spatial clustering, summary statistics, and interactive visualization. We demonstrate the utility of WolfPackR using a case study of 505 genotyped and georeferenced wolf scat samples from Romania. By combining genetic and spatial data, WolfPackR accurately identifies pack structures that align with expert assessments and family tree reconstructions. The package's modular design and reliance on widely used R libraries (dplyr, igraph, sf, leaflet) ensure flexibility and ease of integration into existing workflows. While sampling heterogeneity may limit territory delineation in some cases, WolfPackR offers a cost-effective and reproducible framework for studying wolf pack dynamics, with potential applications for other social species.
King, B.
Simulation-based calibration (SBC) checking is a method to ensure that the inference machinery for a Bayesian statistical analysis is functioning in a correct and unbiased manner. Typically, SBC begins with sampling parameter values from the model priors (prior SBC). However, it has been shown that prior SBC can miss problems when these manifest only in certain regions of parameter space. In phylogenetics, this is relevant not only because of the vastness of tree and parameter space, but also because many phylogenetic analyses involve some degree of model misspecification. Posterior SBC is a recently developed method for checking that the inference algorithms function correctly for a given empirical dataset. Here I use posterior SBC to test the implementation of phylogenetic dating methods in the inference software BEAST 2. I test both the tip-dated approach, employing an Indo-European vocabulary dataset, and the node-dated approach, employing a molecular rRNA dataset of Tabanidae (horseflies). In both cases, posterior SBC tests indicate good calibration. Despite this, posterior predictive datasets simulated from the posterior distribution provided no further increase in the precision of node age estimates compared to the original posterior, a result consistent with previous literature showing fundamental theoretical limits to the identifiability of node ages. Nevertheless, these results suggest that phylogenetic dating methods in BEAST 2 are not biased by problems with the inference machinery, thereby increasing confidence in results obtained using these methods.
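For readers unfamiliar with SBC, the prior variant mentioned at the start of this abstract can be sketched on a toy model. The example below is an illustration under stated assumptions, not the posterior SBC procedure applied to BEAST 2: it uses a conjugate normal model (prior N(0, 1), observations N(θ, 1)) whose posterior can be sampled exactly, and the function name and default settings are hypothetical. If inference is correct, the rank of the true parameter among the posterior draws is uniformly distributed.

```python
import random


def sbc_ranks(n_sims=500, n_obs=5, n_draws=19, seed=1):
    """Prior SBC for a normal-normal model with an exact posterior sampler.

    Returns, for each simulation, the number of posterior draws that fall
    below the parameter value that generated the data (a rank in 0..n_draws).
    """
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)                        # 1. draw from the prior
        data = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # 2. simulate data
        post_var = 1.0 / (n_obs + 1)                       # 3. exact conjugate posterior
        post_mean = sum(data) * post_var
        draws = [rng.gauss(post_mean, post_var ** 0.5) for _ in range(n_draws)]
        ranks.append(sum(1 for d in draws if d < theta))   # 4. rank statistic
    return ranks


ranks = sbc_ranks()
histogram = [ranks.count(r) for r in range(20)]
print(histogram)  # roughly flat if the sampler is calibrated
```

A sampler with a bug (for example, a posterior variance that is too small) would pile ranks up at the extremes of the histogram, which is the kind of failure SBC is designed to expose.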
Kowal, J.; Upham, R.; Kiani, A.; Rickards, M.; Serpell, E.; Bidartondo, M. I.; Evangelisti, E.; Schornack, S.; Sibbit, J.; Treder, K.; Weidinger, S.; Suz, L. M.
1. Root colonisation by endomycorrhizal fungi can indicate habitat condition. However, due to the significant time required to assess colonisation using traditional microscope techniques, studies of colonisation at large scales are impractical. AI-powered approaches may increase output and facilitate ecosystem assessments.
2. We trained our AI-powered tool MycorrhizaFinder (MFKew) on field roots from diverse ecosystems. It was trained to recognise a range of arbuscular and ericoid mycorrhizal fungal structures, and to differentiate dark septate endophytes common in field-sourced roots.
3. Here we describe the semi-automated workflow from root processing and microscope slide scanning to model training and performance evaluation, proposing Macro F1 as the appropriate metric to be optimised. Without human supervision, Macro F1 currently stands at 66% for arbuscular and at 57% for ericoid mycorrhizal colonisation assessment.
4. MFKew is user-friendly, requires no programming skills and offers flexibility for advanced users who wish to further train the tool using their own labelled mycorrhizal root datasets, including images acquired from different devices or staining protocols. This adaptability allows users to customize the model for specific needs, making it well suited to ecologists and agronomists. Additionally, MFKew supports large-scale, repeatable, medium-throughput monitoring across ecosystems, enabling the assessment of mycorrhizal status and tracking changes over time.
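Macro F1, the metric proposed in this abstract, is the unweighted mean of per-class F1 scores, so rare structures count as much as common ones. A minimal sketch follows; the class labels and predictions are hypothetical and are not taken from MFKew.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical annotations for six root-image tiles.
truth = ["arbuscule", "arbuscule", "vesicle", "hypha", "hypha", "hypha"]
model = ["arbuscule", "vesicle", "vesicle", "hypha", "hypha", "arbuscule"]
print(round(macro_f1(truth, model), 3))  # 0.656
```

Here the per-class scores are 0.5 (arbuscule), 0.8 (hypha) and 0.667 (vesicle); overall accuracy would be 4/6 regardless of how the errors are spread across classes, whereas Macro F1 penalises poor performance on any single class.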
Ketwaroo, F. R.; Muller, M. H.; Saracco, J. F.; Schaub, M.
1. Demographic processes in populations are inherently heterogeneous across both space and time. Many ecological models explicitly account for temporal heterogeneity in the demographic rates that govern these processes, but assume spatial homogeneity. Ignoring spatial heterogeneity can bias inference, limit predictive performance, and obscure key spatial structure in demographic rates. Integrated population models (IPMs) offer a powerful framework to estimate spatio-temporal demographic rates by combining diverse ecological data sources collected from multiple sampling locations. However, to accomplish this, IPMs face significant statistical and computational hurdles, including misalignment between different data sources and the need to efficiently account for residual spatial autocorrelation.
2. We present a novel Bayesian spatially explicit integrated population model (sIPM) which integrates population count and capture-recapture data from multiple sampling locations to estimate and predict continuous spatio-temporal demographic rates, such as survival, recruitment and population growth rate, across large geographic domains. This framework employs a joint likelihood approach with change of support to flexibly accommodate spatial and spatio-temporal data misalignment, and incorporates a nearest-neighbor Gaussian process to efficiently model residual spatial autocorrelation and generate spatial predictions.
3. We assess the performance of our sIPM through an extensive simulation study. Results show that our approach provides unbiased and precise estimates and predictions of spatio-temporal demographic rates, even in the presence of significant data misalignment and residual spatial autocorrelation. We demonstrate the utility of our method by analyzing data on Gray Catbirds (Dumetella carolinensis) from the North American Breeding Bird Survey and the Monitoring Avian Productivity and Survivorship program across the eastern coast of the United States from 2004-2014. This analysis results in maps of apparent survival, recruitment and population growth rate, thereby revealing important spatio-temporal variations in demographic rates that would have been obscured by traditional, spatially homogeneous IPMs.
4. Our sIPM offers a robust and computationally efficient method for studying spatio-temporal variation in demographic processes across large areas, even in the presence of data misalignment and residual spatial autocorrelation. Ultimately, this framework, applicable to many ecological monitoring programs, facilitates the development of spatially targeted strategies necessary for effective conservation and management.
Miller, E.; Sanchez Reyes, L.; McTavish, E. J.
Birds are frequently used as a focal taxon for evolutionary and ecological studies. Thousands of papers a year are published using birds as study systems. Hundreds more are published clarifying the evolutionary relationships among clades or regionally circumscribed sets of bird species. Up-to-date phylogenies are essential for informing and guiding avian studies, controlling for the expectation of shared trait evolution, and for science communication, among other applications. However, employing up-to-date phylogenies to address these questions has proven challenging for a number of reasons. First, individually published phylogenies are often hard to access in a usable manner. For example, sequences are usually made available, and an image of the phylogeny is published, but the actual phylogeny data product is often not digitally available. Second, published phylogenies often do not include all taxa of interest or lack branch lengths in units of time. Third, taxonomic mismatches between phylogenies and existing datasets can complicate analyses. We address these issues by sharing an R package, clootl, that wraps together a new, complete, dated bird phylogeny with easy-to-use tools to extract trees for taxa of interest and sample over uncertainty. The phylogeny incorporates information from more than 300 individually published bird phylogenies. The R package includes tools to help appropriately cite these input studies. This software will enable users to smoothly integrate accurate evolutionary information into any analyses on birds.
Malerba, M. E.; Perez-Granados, C.; Bell, K.; Palacios, M. M.; Bellisario, K. M.; Desjonqueres, C.; Marquez-Rodriguez, A.; Mendoza, I.; Meyer, C. F. J.; Ramesh, V.; Raick, X.; Rhinehart, T. A.; Wood, C. M.; Ziegenhorn, M. A.; Buscaino, G.; Campos-Cerqueira, M.; Duarte, M. H. L.; Gasc, A.; Hanf-Dressler, T.; Juanes, F.; do Nascimento, L. A.; Rountree, R. A.; Thomisch, K.; Toledo, L. F.; Toka, M.; Vieira, M.
Passive acoustic monitoring (PAM) enables non-invasive sampling of wildlife across broad spatial, temporal and taxonomic scales. Its ongoing and widespread use has generated unprecedented volumes of acoustic data, shifting the primary bottleneck from data collection to the storage, processing, integration, and interpretation of PAM outputs. Although many software tools exist to address these challenges, differences in their design, scope, and usability often create fragmented and complex analytical workflows. To identify the key barriers and opportunities shaping the implementation of PAM surveys, we conducted a structured expert solicitation involving 30 international practitioners working across terrestrial and aquatic ecosystems. Experts identified and ranked their most critical pain points in current PAM workflows, spanning data storage, processing, and interpretation. The top challenge identified related to accurate species identification using deep learning and artificial intelligence (AI) models, especially in noisy soundscapes or for underrepresented taxa. Eight additional priority challenges included workflow fragmentation, limited availability of user-friendly analytical and visualisation tools, uneven access to software, manual validation bottlenecks, computational constraints, and difficulties in data handling, standardisation, and sharing. Participants also proposed practical mitigation strategies for these priority challenges, supported by step-by-step guidance to help overcome key barriers. Together, these insights provide a roadmap toward more scalable, open-access, and collaborative software systems, which are increasingly essential to realise the full potential of PAM in global biodiversity monitoring.
Howard-Spink, E.; Mircheva, M.; Burkart, J. M.; Townsend, S. W.
Many animals communicate using sequences of signals, but identifying recurrent, non-random signal combinations remains methodologically challenging. Collocation analyses are increasingly popular approaches for detecting which signals animals combine at rates greater than expected by chance. However, existing methods for animal collocation analysis face several limitations that reduce their statistical rigour: they lack uncertainty estimates, fail to control for non-independence in sampled data, and do not account for inflated family-wise error rates when identifying attraction among many different signal types. These limitations restrict the broader applicability of animal collocation analysis, including preventing robust comparisons of signal combination strength between cohorts (e.g. populations, sexes or age classes). We adapt a novel form of Multiple Distinctive Collocation Analysis using Pearson residuals (MDCA-Pr) that addresses these statistical limitations, and validate its use in animal communication research in three ways: first, using numerous simulated datasets of different sizes and levels of signal recombination; second, using simulated data to evaluate the performance of MDCA-Pr in intercohort comparisons, and third, by demonstrating how MDCA-Pr can be applied to compare the vocal sequences produced by male and female captive-living common marmosets (Callithrix jacchus). MDCA-Pr shows high sensitivity, including at small sample sizes, and generally low false-positive rates, which we further reduce by applying additional criteria for identifying attraction between signals. During intercohort comparisons, MDCA-Pr is conservative, with low false-positive rates, and statistical power increases with sample size. MDCA-Pr is a robust method for evaluating signal attraction in animal communication and enables accurate intercohort comparison of animal signal combinations. 
Significance Statement: By assessing the performance of MDCA-Pr on simulated animal-like data, we demonstrate that this method reliably detects signal combinations within and across animal cohorts, while overcoming statistical limitations of previous collocation analyses. We present an analytical pipeline for applying MDCA-Pr to animal signal data, including for intercohort comparisons, enabling identification and comparison of combinatorial strategies across entire signal repertoires. We illustrate this approach by comparing call combination strategies of male and female common marmosets when presented with food under experimental conditions, finding similar combinatorial strategies between sexes. MDCA-Pr therefore permits rigorous characterization of animal signal combinatoriality and opens avenues for investigating how demographic, social, and group-level factors influence combinatorial patterns.
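The Pearson residual at the core of MDCA-Pr compares each observed count with the count expected if signals combined at random: (observed - expected) / sqrt(expected). The sketch below computes these residuals for a table of signal-pair counts; it is only the basic building block, not the authors' pipeline, and it omits the uncertainty estimates, non-independence controls and multiple-testing corrections the abstract describes. The call-type counts are hypothetical.

```python
def pearson_residuals(table):
    """Pearson residuals for a contingency table given as a list of rows.

    Rows index the first signal in a pair, columns the second.
    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    residuals = []
    for row, r in zip(table, row_totals):
        out = []
        for observed, c in zip(row, col_totals):
            expected = r * c / grand_total  # count expected under independence
            out.append((observed - expected) / expected ** 0.5)
        residuals.append(out)
    return residuals


# Hypothetical counts of two-call sequences (rows: first call, columns: second call).
counts = [
    [30, 10],
    [10, 30],
]
for row in pearson_residuals(counts):
    print([round(x, 2) for x in row])
```

Positive residuals indicate pairs that occur more often than expected (attraction) and negative residuals indicate pairs that occur less often (repulsion); here every expected count is 20, so the residuals are about +2.24 on the diagonal and -2.24 off it.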
Ma, Z.; Ellison, A. M.
1. Diversity and heterogeneity are related but distinct and often conflated concepts. Diversity quantifies the number or relative abundance of discrete objects (e.g. species), whereas heterogeneity includes interactions among them (i.e. in networks) and between them and their environments. Although estimation, testing, and inference of diversity are well established and understood in ecology, comparable methods for heterogeneity are themselves diverse and rarely applied consistently or coherently.
2. We propose a consistent and coherent methodology for estimation, testing, and inference of heterogeneity of ecological networks. Estimation of heterogeneity is scalable from individuals to populations using the variance-to-mean (V/M) ratio and extensions of Taylor's power law (TPL) to the analysis of networks. Bootstrapping is used to partition heterogeneous and random clusters, whereas permutation tests are used to compare individual- and network-level heterogeneity. Inference includes the identification of "important" (e.g. dominant, foundation, keystone) species and "rich clubs" in heterogeneous networks, detection of biomarkers, and analysis of heterogeneity-stability relationships.
3. We demonstrate this methodology using the global Earth Microbiome Project dataset. The method could reliably distinguish heterogeneous nodes and networks; identified significant differences in heterogeneity among microbial assemblages in different habitats and in specific sites within habitats; and supported established principles of host filtering, species sorting, and niche partitioning.
4. Our methods for estimation, testing, and inference of heterogeneity are modular, scalable, and applicable to a wide range of ecological systems. They also provide a quantitative method for understanding how evolutionary and ecological forces jointly shape both topology and heterogeneity in ecological networks.
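The two classical quantities named in this abstract are simple to compute. The sketch below shows the standard variance-to-mean ratio and an ordinary least-squares fit of Taylor's power law, V = a * M^b, on log-log axes. It does not reproduce the authors' network extensions, bootstrapping or permutation tests, and the abundance values are hypothetical.

```python
import math
from statistics import mean, variance


def variance_to_mean(counts):
    """V/M ratio: about 1 for Poisson-like (random) counts, above 1 for aggregated counts."""
    return variance(counts) / mean(counts)


def fit_taylor_power_law(means, variances):
    """Fit log V = log a + b * log M by ordinary least squares; return (a, b)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = math.exp(y_bar - b * x_bar)
    return a, b


# Hypothetical abundances of one taxon across four samples.
print(round(variance_to_mean([0, 2, 4, 6]), 3))  # 2.222

# Hypothetical mean-variance pairs that follow V = 2 * M^1.5 exactly.
a, b = fit_taylor_power_law([1, 4, 9, 16], [2, 16, 54, 128])
print(round(a, 3), round(b, 3))  # 2.0 1.5
```

An exponent b greater than 1 means that variance grows faster than the mean, which is the usual signature of aggregation (heterogeneity) in Taylor's power law.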
Boehnke, D.
1. Standardising temperature data across heterogeneous study sites is essential for ecological meta-analyses, yet elevation-driven lapse rates often confound direct comparisons of coarse-grid climate data. Ecological studies frequently document only site altitude - particularly historical datasets - limiting analysis of thermal influences on spatial organism distribution.
2. A dual-approach protocol was developed to derive regional correction factors (ΔH) from altitude-temperature regressions (Lapse Rate Method: SW Germany/Italian Alps, n=33 stations) and cross-regional station pairs (TAV Matching Method, n=27) with closely aligned long-term mean temperatures (ΔTAV ≤ 1.2 °C). Applied to 109 Ixodes ricinus study sites across nine European regions, correction factors were calculated only for regions with consistent altitude shifts (ΔH > 100 m) relative to Southwest German reference stations.
3. Regional correction factors (ΔH) from both methods included +1300 m (Finland, TAV Matching), +370 m (Netherlands/NE Germany, TAV Matching), and -220 m (Italian Alps, Lapse Rate Method) across five regions. In total, 27 cross-regional TAV matched pairs demonstrated high matching precision (median ΔTAV = 0.05 °C, 89% ≤ 0.2 °C). These factors standardised site altitudes to a common SW German thermal reference frame, enabling cross-site comparability.
4. The dual-method protocol requires no automation and is applicable to any taxon with documented site altitudes. The complete methodological workflow - including station data, lapse rate regressions, matching decisions, and correction calculations - is publicly available at Zenodo [DOI 10.5281/zenodo.18835116], providing ecologists with a pragmatic, fully reproducible template for elevation-standardised temperature estimation in meta-analyses.
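The logic behind a lapse-rate-based altitude correction can be sketched as follows. This is one plausible reading of the Lapse Rate Method and not the published workflow: the station values are invented, the function names are hypothetical, and the sign convention (a positive ΔH for a region that is colder than the reference at the same altitude) is an assumption chosen to match the direction of the reported factors.

```python
def fit_lapse_rate(altitudes_m, temps_c):
    """Least-squares fit of temperature = intercept + slope * altitude.

    Returns (slope in degrees C per metre, intercept in degrees C).
    """
    n = len(altitudes_m)
    h_bar = sum(altitudes_m) / n
    t_bar = sum(temps_c) / n
    slope = (sum((h - h_bar) * (t - t_bar) for h, t in zip(altitudes_m, temps_c))
             / sum((h - h_bar) ** 2 for h in altitudes_m))
    return slope, t_bar - slope * h_bar


def altitude_correction(temp_difference_c, lapse_rate_c_per_m):
    """Convert a regional temperature offset into an equivalent altitude shift.

    temp_difference_c is (regional mean - reference mean) at equal altitude.
    With a negative lapse rate, a colder region yields a positive shift.
    """
    return temp_difference_c / lapse_rate_c_per_m


# Hypothetical reference stations: altitude (m) and long-term mean temperature (C).
slope, intercept = fit_lapse_rate([100, 500, 1000], [9.4, 7.0, 4.0])
print(round(slope * 1000, 2), "C per km")                 # -6.0 C per km
print(round(altitude_correction(-1.2, slope)), "m shift")  # 200 m shift
```

Adding the resulting shift to each site's documented altitude would express that site in the thermal reference frame of the stations used for the regression.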
O'Sullivan, J.; Whittaker, C.; Xenakis, G.; Robson, T.; Perks, M.
Peatlands are an important terrestrial carbon sink which, when drained, can produce substantial CO2 efflux. Low-productivity forestry planted on drained peatlands can become a net carbon source if losses from drained soils exceed sequestration by the trees. Decision support tools which assist resource allocation and intervention planning in forest-to-bog restoration are needed to mitigate this substantial environmental harm. Predicting carbon mitigation benefits associated with forest-to-bog restoration is a major challenge, however, due to the lack of long-term monitoring programs and the fact that mitigation times depend on processes distant from the intervention. Here we introduce the PEATREST life cycle assessment (LCA), which predicts carbon fluxes associated with forest-to-bog restoration, including those due to processes far from restored sites. The LCA estimates mitigation timescales, defined as the time following intervention at which the restored peatland is predicted to sequester or store more carbon than the forestry would have if retained.
Highlights:
- Here we develop a novel forest-to-bog life cycle assessment (LCA) tool
- The LCA predicts carbon mitigation times following peatland restoration
- The model combines a variety of process-based and empirical sub-models
- Example implementations for two different restoration scenarios are explored
- Sensitivity analysis highlights the model inputs that most impact outcomes
Graphical abstract: The PEATREST life cycle assessment (LCA) generates compound time series of carbon sequestration and carbon storage for two scenarios: the forest-to-bog peatland restoration (PR) and a counterfactual (CF) of forestry retention. By comparing the two scenarios, the LCA predicts the carbon mitigation timescales (vertical dashed lines). These are defined as the time following harvesting at which the peatland is predicted to sequester more (emit less), or to have stored more (lost less) carbon, than the forestry would have if retained.
Annells, A.; Breed, M.; Cavagnaro, T. R.; Hodgson, R. J.; Costin, S.; Davies, T.; Taylor, A. F.; Robinson, J. M.
Soil degradation threatens food security, climate regulation, biodiversity and human wellbeing worldwide. Up to 75% of the world's soils are already degraded, and in response, global restoration efforts are rapidly scaling up to meet international targets. Monitoring soil biodiversity recovery remains a major barrier to tracking restoration success, particularly for invertebrates that underpin key processes including nutrient cycling and soil aggregation. Traditional sampling methods are labour-intensive, destructive and poorly suited to long-term or landscape-scale monitoring. Soil ecoacoustics is rapidly emerging as a promising non-destructive soil biodiversity monitoring tool. However, its capacity for taxonomic resolution of invertebrate groups remains untested. Here, we present a proof-of-concept study that establishes the potential for a soil invertebrate acoustic classifier. We used a low-cost, sound-attenuated recording system and quantified 19 spectral and temporal audio features from six morphologically and behaviourally distinct invertebrate species under controlled conditions. Acoustic profiles differed among taxa and generally clustered into podous and apodous groups (i.e., organisms with legs versus those without). Variation was driven primarily by taxon identity rather than body mass, suggesting that acoustic signatures capture taxon-specific traits. This work provides a foundation for developing automated acoustic classifiers that could enable scalable, non-destructive soil biodiversity monitoring.
De Marco, R.
This paper presents a six-stage methodological framework for Convolutional Neural Network (CNN)-based cetacean vocalization detection and classification in Passive Acoustic Monitoring (PAM), implemented as the open-source toolkit ai-pam-pipeline. The framework is generalizable across species and fully parameterised through a single configuration file, guaranteeing exact experimental reproducibility. Two experiments are reported. Experiment A examines the effect of FFT window length Nfft ∈ {256, 512, 1024} on binary bottlenose dolphin (Tursiops truncatus) whistle detection using stratified 10-fold cross-validation on an in-domain dataset (Oltremare, 192 kHz) and a cross-domain benchmark (DCLDE 2022). In-domain performance is uniformly high (macro F1 ≈ 0.98; Wilcoxon, all p > 0.05). Cross-domain results diverge substantially: Nfft = 256 is significantly superior (p = 0.006, rank-biserial r = 0.89). The mechanism is an upsampling amplification effect: coarser spectral bins produce wider, higher-contrast FM traces after bilinear resampling to fixed image dimensions. This superiority is threshold-invariant: precision equals 1.000 across all configurations and thresholds θ ∈ [0.1, 0.9], confirming that the advantage is not an artifact of threshold choice. These findings demonstrate that preprocessing choices -- often treated as secondary implementation details -- can significantly affect cross-domain generalisation. While Nfft serves here as a controlled case study, the framework is designed to enable systematic, reproducible evaluation of arbitrary preprocessing parameters within a unified experimental protocol. Experiment B demonstrates multiclass capability on five T. truncatus vocalization categories (macro F1 = 0.843); inter-class confusion between click trains and burst-pulse sounds reflects biological signal overlap rather than classifier failure.
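The "coarser spectral bins" mentioned in this abstract follow directly from the FFT window length: each bin spans sample_rate / Nfft hertz, and each window spans Nfft / sample_rate seconds. The short sketch below tabulates both for the three window lengths at the 192 kHz sampling rate of the in-domain dataset. The function name is hypothetical, and the calculation says nothing about window overlap or the toolkit's actual configuration.

```python
def stft_resolution(sample_rate_hz, n_fft):
    """Return (frequency bin width in Hz, window duration in ms) for one FFT window."""
    bin_width_hz = sample_rate_hz / n_fft
    window_ms = n_fft / sample_rate_hz * 1000.0
    return bin_width_hz, window_ms


for n_fft in (256, 512, 1024):
    bin_hz, win_ms = stft_resolution(192_000, n_fft)
    print(f"Nfft={n_fft}: {bin_hz:.1f} Hz per bin, {win_ms:.2f} ms per window")
# Nfft=256: 750.0 Hz per bin, 1.33 ms per window
# Nfft=512: 375.0 Hz per bin, 2.67 ms per window
# Nfft=1024: 187.5 Hz per bin, 5.33 ms per window
```

Halving Nfft therefore doubles the bin width and halves the window duration, which is the trade-off between frequency and time resolution that Experiment A varies.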
Kadlec, I.; Bartak, V.; Selimovic, A.; Kutal, M.; Dula, M.; Stier, N.; Meissner-Hylanova, V.; Peskova, L. B.; Sladecek, M.; Vorel, A.; Signer, J.
1. Classifying animal movement strategies from GPS tracking data is essential for understanding space use, population dynamics and conservation planning. However, existing approaches require strong parametric assumptions about trajectory shape, need large labelled (i.e. expert-annotated) datasets for machine learning, or lack formal uncertainty quantification. These limitations create barriers for researchers working with novel species or limited sample sizes.
2. We present a profile-based classification framework consisting of three steps. First, trajectories are segmented using breakpoint detection applied to Net Squared Displacement (NSD) time series. Movement metrics are then extracted from each segment and classified by comparing them to empirically derived behavioural profiles via Z-score distances transformed to softmax probabilities. Finally, bootstrap resampling of both training and test data quantifies uncertainty in the resulting classifications. We validated the framework through simulation experiments and applied it to GPS tracking data from two ecologically contrasting species: gray wolf (Canis lupus; 43 individuals) and northern lapwing (Vanellus vanellus; 15 individuals).
3. Simulations showed that 5-10 training segments per movement strategy suffice for reliable classification, with an overall accuracy of 91.1% across residential, floating and dispersal strategies. Segment durations of 30-60 days were required for confident discrimination of residential and floating behaviour. For wolves, the framework clearly distinguished residency, floating and dispersal (91.2% of segments classified with >50% probability). For lapwings, migration was identified with high confidence, while residential-floating discrimination reflected genuine ecological ambiguity confirmed by domain experts, with bootstrap confidence intervals transparently flagging uncertain cases.
4. The profile-based framework provides an accessible, interpretable alternative to parametric NSD fitting and machine learning approaches, requiring modest training data while delivering probabilistic classifications with honest uncertainty estimates. An R package (moveprofile) implementing the complete workflow is freely available. The framework is applicable to any tracked species for which distinct movement strategies can be identified from expert knowledge.
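The classification step described in this abstract can be illustrated with a minimal Python sketch. It is not the moveprofile implementation: the profile layout, the use of a mean absolute Z-score as the distance, and the metric name `max_nsd` are all assumptions made for illustration, and the breakpoint detection and bootstrap steps are omitted.

```python
import math

def net_squared_displacement(track):
    """NSD of each fix: squared Euclidean distance from the first fix."""
    x0, y0 = track[0]
    return [(x - x0) ** 2 + (y - y0) ** 2 for x, y in track]

def classify_segment(metrics, profiles):
    """Return {strategy: probability} for one trajectory segment.

    metrics  : {metric_name: value} measured on the segment
    profiles : {strategy: {metric_name: (mean, sd)}}  (hypothetical layout)
    """
    distances = {}
    for strategy, profile in profiles.items():
        z = [abs(metrics[m] - mean) / sd for m, (mean, sd) in profile.items()]
        distances[strategy] = sum(z) / len(z)  # assumed: mean absolute Z-score
    # Softmax over negative distances: a closer profile gets a higher
    # probability. Subtracting the minimum keeps exp() numerically stable.
    best = min(distances.values())
    weights = {s: math.exp(-(d - best)) for s, d in distances.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

# Toy example with made-up numbers (not taken from the study).
track = [(0, 0), (1, 1), (2, 0), (1, -1)]
nsd = net_squared_displacement(track)
profiles = {
    "residential": {"max_nsd": (4.0, 2.0)},
    "dispersal": {"max_nsd": (400.0, 100.0)},
}
probs = classify_segment({"max_nsd": float(max(nsd))}, profiles)
print(probs)
```

Reporting the full probability vector, rather than only the most likely label, is what lets ambiguous segments (for example residential versus floating) stay visible in the output.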
Smith, T. Q.; Szpiech, Z. A.
Patterson's D statistic, also known as the ABBA-BABA statistic, is widely used to detect archaic genome-wide introgression between two non-sister taxa. Requiring only a single lineage from each of four taxa, one of which acts as an outgroup to determine the ancestral allele, Patterson's D measures the imbalance between the number of biallelic sites where the second and third taxa share the derived allele (ABBA sites) and the number where the first and third taxa share it (BABA sites). When there is no introgression, these counts are expected to be equal, and a discordance between them suggests introgression from the third taxon into either the first or the second. Patterson's D is limited to the detection of genome-wide introgression and exhibits a high false-positive rate when applied to smaller genomic segments. Here, we present a new method, D STatistic with Allelic Rarefaction (D*), to address these limitations. D* uses multiple lineages and does not require an outgroup to calculate the imbalance between the number of alleles found exclusively in the second and third taxa and the number of alleles found exclusively in the first and third taxa. D* employs a rarefaction technique to correct for unequal sample sizes and allows multiallelic sites. We use simulations to show that D* has better precision and recall for detecting introgressed segments of DNA than similar methods under a wide variety of model parameters and in the presence of technical artifacts common to ancient DNA analyses. We conclude with an analysis of Denisovan DNA introgression in modern-day Papuans. Precompiled executables, the manual, and source code can be found at https://github.com/TQ-Smith/DSTAR
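For orientation, the classic single-lineage statistic that D* builds on can be sketched in a few lines of Python. This shows only the standard form D = (nABBA - nBABA) / (nABBA + nBABA), not D* itself; the four-character site encoding and the function name are assumptions made for illustration.

```python
def patterson_d(sites):
    """Classic Patterson's D from per-site allele patterns.

    sites: iterable of 4-character strings giving the allele carried by
    taxon 1, taxon 2, taxon 3 and the outgroup, with 'A' = ancestral and
    'B' = derived (hypothetical encoding chosen for this sketch).
    """
    abba = sum(1 for s in sites if s == "ABBA")  # taxa 2 and 3 share derived
    baba = sum(1 for s in sites if s == "BABA")  # taxa 1 and 3 share derived
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites: D is undefined")
    return (abba - baba) / (abba + baba)

# Toy input: three ABBA sites, one BABA site, two uninformative sites.
example_sites = ["ABBA", "ABBA", "ABBA", "BABA", "BBAA", "AAAA"]
print(patterson_d(example_sites))  # 0.5
```

A value near zero is consistent with no introgression. In practice significance is usually assessed with a block jackknife over the genome, which this sketch leaves out.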
Smeele, S. Q.; Hauer, C.; Bergler, C.; Dechmann, D. K. N.; Dietzer, M. T.; Elmeros, M.; Fjederholt, E. T.; Fogato, A.; Kohles, J. E.; Noeth, E.; Brinkloev, S. M. M.
1. Bats are a diverse taxonomic group that display a wide range of interesting behaviours. Many bats are keystone species for their ecosystem, are IUCN Red-listed as vulnerable to critically endangered, and are subject to human-wildlife conflicts arising from anthropogenic expansion. Yet bats remain understudied with respect to behaviour, population ecology and conservation status. One of the major challenges when studying bats is obtaining data. Their nocturnal lifestyle and use of ultrasonic echolocation make them difficult to track and record using traditional methods. Recent advances in passive acoustic monitoring have allowed researchers to record large amounts of data, but the detection and classification of vocalisations remain a challenge. Most available tools are either for-profit or limited to a narrow geographic range, and mostly focus on echolocation search phase calls.
2. Here we present BatSpot, a convolutional neural network trained to detect search phase calls, buzzes and social calls. It also offers the option to classify the search phase calls to species(-complex) level. We provide a GUI that allows researchers to retrain or transfer-train the models for their specific needs and validate the performance.
3. We test the performance of all models and show that they perform better than both commercial and open-source solutions (search phase file level F1: 0.97 vs 0.96, buzz detector F1: 0.95 vs 0.11). We furthermore show that retraining the search phase call detector for a new country with examples from just 59 recordings substantially improves the performance (F1: 0.48 to 0.79).
4. BatSpot will enable bat researchers globally to automate detection and classification with minimal effort and includes novel options for social call and buzz detection, typically not featured in other automated tools for bat monitoring.
Swiston, S. K.; Kuehne, L.; Moore, R.; Landis, M. J.
Computational workshops are common in evolutionary biology and are used to share discipline-specific tools and skills with researchers. Despite the perceived importance of these workshops, there is no common set of criteria for workshop success, and there are few peer-reviewed studies investigating the efficacy of workshops or assessing the value of particular instructional techniques in this context. Here, we focused on one key element of a successful workshop: its ability to increase participants' motivation to use the methods and tools presented during the workshop. We analyzed the goals, perceptions, and future plans of research practitioners engaging in a workshop on phylogenetic methods of historical biogeography using pre- and post-workshop surveys. Overall, the workshop was successful at motivating participants, and survey responses provided insights into participants' perceptions of different activities, including "participatory live coding". Beyond this case study, we aim to highlight the importance of developing a common set of workshop goals in collaboration with other workshop stakeholders and the need for specialized, validated tools for assessing the efficacy of computational workshops for researchers.
Koshkarov, A.; Tahiri, N.
Phylogenetic trees represent the evolutionary histories of taxa and support tasks such as clustering and Tree of Life reconstruction. Many established comparison methods, including the Robinson-Foulds (RF) distance, assume identical taxon sets. A methodological gap remains for trees with distinct but overlapping taxa. Existing approaches either prune non-common leaves, which can discard information, or complete both trees such that they share the same taxa. Completion is more comprehensive, but current methods typically ignore branch lengths, which are essential for identifying evolutionary patterns. This paper introduces k-Nearest Common Leaves (k-NCL), an algorithm for completing rooted phylogenetic trees defined on different but overlapping taxa. The method uses branch lengths and topological characteristics and does not rely on a specific distance measure. The k-NCL algorithm is designed to preserve evolutionary relationships in the trees under comparison. The running time is O(n^2), where n is the size of the union of the two leaf sets. Additional properties include preservation of original distances and topology, symmetry, and uniqueness of the completion. Implemented in Python, k-NCL is evaluated on biological datasets of amphibians, birds, mammals, and sharks. Experimental results show that RF combined with k-NCL improves phylogenetic tree clustering performance compared to the RF(+) tree completion approach. Availability and implementation: An open-source implementation of k-NCL in Python and the datasets used in this study are available at https://github.com/tahiri-lab/KNCL.
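To make the pruning baseline mentioned in this abstract concrete, the following Python sketch restricts two rooted trees to their common leaves and then computes the RF distance as the number of clades found in only one of them. It is not k-NCL and it ignores branch lengths. The nested-tuple tree encoding and all function names are assumptions made for illustration.

```python
def leaves(tree):
    """Leaf set of a rooted tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def prune(tree, keep):
    """Restrict the tree to leaves in `keep`, suppressing one-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(c, keep) for c in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def clusters(tree):
    """Leaf sets of all internal nodes except the root."""
    found = set()

    def walk(node, is_root):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(c, False) for c in node))
        if not is_root:
            found.add(below)
        return below

    walk(tree, True)
    return found

def rf_on_common_leaves(t1, t2):
    """RF distance after pruning both trees to their shared taxa."""
    common = leaves(t1) & leaves(t2)
    if not common:
        raise ValueError("trees share no taxa")
    a = clusters(prune(t1, common))
    b = clusters(prune(t2, common))
    return len(a ^ b)  # clades present in exactly one tree

# Toy trees sharing taxa A-D; E and F each occur in only one tree.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "F"))
print(rf_on_common_leaves(t1, t2))  # 2
```

In the toy example the placement of E and F plays no part in the result, which is the information loss that completion methods aim to avoid.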